19 research outputs found

Determination of traffic control tables by HPC

Author: Haijema R.
Hendrix E.M.T.
Tabik S.

Wageningen University & Research Publications

TOMOBFLOW: feature-preserving noise filtering for electron tomography

Author: A Geissler
AE Yakushevska
AP Leis
AS Frangakis
BF McEwen
C Messaoudi
D Uttenweiler
H Scharr
I Rouiller
J Fontana
J Weickert
JA Briggs
JB Heymann
JJ Fernandez
Jose-Jesus Fernandez
M Barcena
M Cyrklaff
M Tagari
ME Martone
P van der Heide
R Kimmel
RC Gonzalez
S Pruggnaller
S Tabik
V Lucic
Y Wang
Publication venue: BioMed Central
Publication date: 01/06/2009

Abstract. Background: Noise filtering techniques are needed in electron tomography to allow proper interpretation of datasets. The standard linear filtering techniques are characterized by a tradeoff between the amount of noise reduced and the blurring of the features of interest. On the other hand, sophisticated anisotropic nonlinear filtering techniques allow noise reduction with good preservation of structures. However, these techniques are computationally intensive and difficult to tune to the problem at hand. Results: TOMOBFLOW is a program for noise filtering that preserves biologically relevant information. It is an efficient implementation of the Beltrami flow, a nonlinear filtering method that locally tunes the strength of the smoothing according to an edge indicator based on geometric properties. Because this method has no free parameters that are hard to tune, TOMOBFLOW is a user-friendly filtering program equipped with the power of diffusion-based filtering methods. Furthermore, TOMOBFLOW can handle different types and formats of images, which makes it useful for electron tomography in particular and bioimaging in general. Conclusion: TOMOBFLOW allows efficient noise filtering of bioimaging datasets with preservation of the features of interest, thereby yielding data better suited for post-processing, visualization and interpretation. It is available at the web site http://www.ual.es/%7ejjfdez/SW/tomobflow.html.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Digital.CSIC

GHOST: Building Blocks for High Performance Sparse Linear Algebra on Heterogeneous Systems

Author: A Pieper
A Stathopoulos
Achim Basermann
Andreas Pieper
C Augonnet
C Chevalier
CG Baker
DP O’Leary
E Chow
E Polizzi
Faisal Shahzad
G Hager
G Schofield
G Schubert
Georg Hager
Gerhard Wellein
GW Stewart
Holger Fehske
Jonas Thies
LS Blackford
M Galgon
M Kreutzer
M Röhrig-Zöllner
Martin Galgon
Melven Röhrig-Zöllner
Moritz Kreutzer
P Ghysels
R Lehoucq
S Kaczmarz
S Tabik
S Williams
TC Oppe
Publication venue: 'Springer Science and Business Media LLC'

Crossref

A tuning approach for iterative multiple 3d stencil pipeline on GPUs: Anisotropic Nonlinear Diffusion algorithm as case study

Author: Peemen M.
Romero L. F.
Tabik S.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/04/2018

This paper focuses on challenging applications that can be expressed as an iterative pipeline of multiple 3D stencil stages and explores their optimization space on GPUs. For this study, we selected a representative example from the field of digital signal processing, the Anisotropic Nonlinear Diffusion algorithm. An open issue for these applications is to determine the optimal fission/fusion level of the involved stages and whether that combination benefits from data tiling. This implies exploring the large space of all possible fission/fusion combinations with and without tiling, which makes the process non-trivial. This study provides insights to reduce the optimization tuning space and programming effort of iterative multiple 3D stencils. Our results demonstrate that all combinations that fuse the bottleneck stencil with a high halo update cost (> 25%; this percentage can be measured or estimated experimentally for each single stencil) and high register and shared memory accesses must not be considered in the exploration process. The optimal fission/fusion combination is up to 1.65× faster than the case in which we fully decompose our stencil without tiling and 5.3× faster than the fully fused version on the NVIDIA GPUs

Pure OAI Repository

Minisymposium - Bioinformatics: Implementation of Anisotropic Nonlinear Diffusion for Filtering 3D Images in Structural Biology on SMP Clusters

Author: Fernández J. J.
García I.
Garzón E. M.
Tabik S.
Publication venue: John von Neumann Institute for Computing
Publication date: 01/01/2006

Juelich Shared Electronic Resources

Demystifying the 16 × 16 thread-block for stencils on the GPU

Author: Henk Corporaal
N Guil
Maurice CJ Peemen
Siham Tabik
Publication venue: 'Royal College of Obstetricians & Gynaecologists (RCOG)'
Publication date: 01/01/2015

Summary: Stencil computation is of paramount importance in many fields, including image processing, structural biology and biomedicine. There is a constant demand to maximize the performance of stencils on state-of-the-art architectures such as graphics processing units (GPUs). One of the important issues when optimizing these kernels for the GPU is the selection of the thread-block that maximizes the overall performance. Usually, programmers look for the optimal thread-block configuration in a reduced space of square thread-block configurations or simply use the best configuration reported in previous works, which is usually 16 × 16. This paper provides a better understanding of the impact of thread-block configurations on the performance of stencils on the GPU. In particular, we model locality and parallelism and consider that the optimal configurations are within the space that provides: (1) a small number of global memory communications; (2) good shared memory utilization with a small number of conflicts; (3) good utilization of the streaming multiprocessors; and (4) high efficiency of the threads within a thread-block. The model determines the set of optimal thread-block configurations without the need to execute the code. We validate the proposed model using six stencils with different halo widths and show that it reduces the optimization space to around 25% of the total valid space. The configurations in this space achieve a throughput of at least 75% of that of the best configuration, and the space is guaranteed to include the best configurations.

Repository TU/e

Pure OAI Repository